\[ \text{area of square} = (2r)^2 = 4r^2 \\ \text{area of circle} = \pi r^2 \\ \frac{\text{area of circle}}{ \text{area of square}} = \frac{\pi}{4} \]
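This ratio gives the classic Monte Carlo estimate of \(\pi\): scatter points uniformly over the square and multiply the fraction that lands inside the circle by 4. A minimal R sketch (the variable names are mine, not from the slides):

```r
set.seed(42)
# Sample points uniformly in the square [-1, 1]^2 (side 2r, with r = 1)
n <- 100000
x <- runif(n, min = -1, max = 1)
y <- runif(n, min = -1, max = 1)
# The fraction of points inside the unit circle estimates pi / 4
in_circle <- x^2 + y^2 <= 1
4 * mean(in_circle)   # close to pi
```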
I roll a die repeatedly and only stop when I get the same number twice in a row.
❓ What is \(\mathbb{E}(\text{# of die rolls})\)?
\[ \mathbb{E}_0 = \underbrace{1}_\text{first roll} + \mathbb{E}_1 \]
\[ \mathbb{E}_1 = \underbrace{\frac{1}{6}(1)}_\text{matched last roll} + \underbrace{\frac{5}{6} (1 + \mathbb{E}_1)}_\text{didn't} \\ \mathbb{E}_1 - \frac{5}{6}\mathbb{E}_1 = \frac{1}{6} + \frac{5}{6} \\ \frac{1}{6}\mathbb{E}_1 = 1 \\ \mathbb{E}_1 = 6 \\ \mathbb{E}_0 = 1 + \mathbb{E}_1 = 7 \]
n_sim <- 1000

sim_game <- function(sides = 6){
  last_roll <- 999   # sentinel: no previous roll yet
  n_rolls <- 0
  repeat {
    roll <- sample(1:sides, size = 1) # die roll
    n_rolls <- n_rolls + 1
    if (roll == last_roll){
      return(n_rolls)
    } else {
      last_roll <- roll
    }
  }
}

out <- sapply(1:n_sim, function(x) sim_game())
mean(out)
## [1] 7.27
“Easy”
Sometimes Fast
Not always fast
Not always exact
Graph of states + transitions.
Markov Property: Future state depends only on current state (efficient…but can cause problems)
\[ P(X_n = x | X_1 = x_1, X_2 = x_2, ..., X_{n-1}= x_{n-1}) = P(X_n = x | X_{n-1}= x_{n-1}) \]
\[ A = \begin{bmatrix} 0.2 & 0.3 & 0.5 \\ 0.6 & 0 & 0 \\ 0.2 & 0.7 & 0.5 \end{bmatrix} \]
A steady state or stationary distribution is a probability vector \(x\) that the chain leaves unchanged. Our \(A\) is column-stochastic (each column sums to 1), so the steady state satisfies \(Ax = x\).
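One way to find the stationary distribution numerically is power iteration: keep applying the transition matrix until the distribution stops changing. A sketch for the \(A\) above (its columns sum to 1, so it maps probability vectors to probability vectors; this code is mine, not from the slides):

```r
# Power iteration: repeatedly apply A to a starting distribution
A <- matrix(c(0.2, 0.3, 0.5,
              0.6, 0,   0,
              0.2, 0.7, 0.5),
            nrow = 3, byrow = TRUE)
x <- c(1, 0, 0)                    # start with all mass in state 1
for (i in 1:100) x <- as.vector(A %*% x)
round(x, 3)                        # steady state: a vector x with A x = x
```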
\[ p(x)T(y|x) = p(y)T(x|y) \,\,\, \forall_{x,y} \Rightarrow \\ \sum_{x}p(x)T(y|x) = p(y)\sum_{x}T(x|y) = p(y) \Rightarrow \\ pT = p \]
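A quick numeric sanity check of this implication, using a hypothetical 2-state chain (rows are "from" states, so `T_mat[x, y]` is \(T(y|x)\); the numbers are mine):

```r
# A 2-state chain that satisfies detailed balance with p = (2/3, 1/3)
T_mat <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE)
p <- c(2/3, 1/3)
# Detailed balance: p(x) T(y|x) = p(y) T(x|y)
p[1] * T_mat[1, 2]          # 1/15
p[2] * T_mat[2, 1]          # 1/15
# Therefore p is stationary: p T = p
as.vector(p %*% T_mat)      # (2/3, 1/3)
```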
Problem: we want to sample from \(p(x)\), but we don’t have its PDF/CDF; we only know its shape \(f(x)\) (an unnormalized density)
\[ p(x) \propto f(x) \]
Solution: sample from an easy distribution \(g(x)\) and accept each draw \(x\) with probability \(\frac{f(x)}{M\,g(x)}\), where \(M\) satisfies \(f(x) \le M\,g(x)\) for all \(x\)
Proof:
\[ D(x | A) = \frac{P(A|x)D(x)}{P(A)} = \frac{\frac{f(x)}{M\,g(x)}\, g(x)}{P(A)} \\ P(A) = \int g(x) \frac{f(x)}{M\,g(x)}\, dx = \frac{1}{M} \int f(x)\,dx = \frac{NC}{M} \\ D(x | A) = \frac{f(x)/M}{NC/M} = \frac{f(x)}{NC} = p(x) \]

where \(NC = \int f(x)\,dx\) is the normalizing constant.
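A minimal rejection-sampling sketch in R. The target shape \(f(x) = x^2(1-x)\) on \([0,1]\) (an unnormalized Beta(3, 2)) and the proposal \(g = \text{Uniform}(0,1)\) are my illustrative choices, not from the slides:

```r
set.seed(42)
f <- function(x) x^2 * (1 - x)   # unnormalized target shape
M <- 0.15                        # bound: f(x) <= M * g(x) = M on [0, 1]
n <- 50000
x <- runif(n)                    # proposals from g = Uniform(0, 1)
u <- runif(n)
accepted <- x[u <= f(x) / M]     # keep with probability f(x) / (M * g(x))
mean(accepted)                   # near 0.6, the mean of Beta(3, 2)
```

Note how wasteful this is: roughly half the proposals are thrown away, which is one of the cons listed below.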
Cons: samples are independent, so we throw away information about a good value once we find it; the method can be inefficient; and \(g(x)\) (and \(M\)) are hard to choose.
Enter MCMC! 💡 Next sample depends on previous sample.
Markov Chain: we’re using a Markov process to generate proposed samples
Monte Carlo: we’re drawing random samples to estimate a distribution
We want a Markov chain whose steady state is the target distribution \(p(x)\).
❓ Do Markov processes converge to their steady state right away?
Goal: find a Markov chain whose steady state (after a burn-in period) is \(p(x)\)
👑 King Markov wants to visit his 5 islands with populations 100, 200, 300, 400, 500
Metropolis Islands
Requirements:
visit islands proportional to population
no tracking islands/populations over time
randomness/spontaneity of visits
❓ How?
💡 How would you choose \(J\) and \(A\)?
\(J\): any random process where \(J(a | b) = J(b | a)\)
\(A\): \(\min \left (1,\frac{pop(\color{blue}{\text{island}_n})}{pop(\color{red}{\text{island}_{n-1}})} \right )\)
markov_step <- function(island){
  islands <- LETTERS[1:5]
  islands_pop <- list(A = 100,
                      B = 200,
                      C = 300,
                      D = 400,
                      E = 500)
  # propose any island other than the current one
  proposed_i <- sample(islands[islands != island],
                       size = 1)
  accept_prob <- min(1,
                     islands_pop[[proposed_i]]/islands_pop[[island]])
  u <- runif(1)
  accepted <- u <= accept_prob
  value <- ifelse(accepted,
                  proposed_i,
                  island)
  value
}

markov_step("A")
## [1] "C"
n_sim <- 10000
isl <- "A"
track <- c(isl)
for (i in 1:n_sim){
  isl <- markov_step(isl)
  track <- c(track, isl)
}
ggplot(data = data.frame(track),
       aes(x = track, fill = track)) +
  geom_bar(color = "black") +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        legend.position = "none") +
  labs(x = "Island",
       y = "Nights Spent")

chain: sequence of draws from a distribution
Markov chain: the next state in the chain depends only on the previous state
draw: one sample/state from the chain
\[ A(x_n \rightarrow x_*) = \min \left (1, \frac{f(x_*)}{f(x_n)} \times \frac{q(x_n | x_*)}{q(x_* | x_n)} \right ) \]
\(\frac{q(x_n | x_*)}{q(x_* | x_n)}\) corrects for bias in which values the proposal distribution will generate.
when \(q(x_n | x_*) >q(x_* | x_n)\) the correction makes us more likely to go to \(x_*\)
When \(q(x_n | x_*) = q(x_* | x_n)\) the correction goes away, leaving us with King Markov’s method (Metropolis Algorithm).
Symmetric proposal: \(q(x_n | x_*) = q(x_* | x_n)\)
Asymmetric proposal: \(q(x_n | x_*) \neq q(x_* | x_n)\)
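To make the correction concrete, here is a sketch of the full Metropolis–Hastings acceptance probability with an asymmetric proposal. The target \(\text{Exp}(1)\), the log-normal proposal, and all names are illustrative assumptions, not from the slides:

```r
# f: target density; q_dens(to, from) = q(to | from), a log-normal
# random-walk proposal, which is asymmetric: q(x | y) != q(y | x)
f <- function(x) dexp(x, rate = 1)
q_dens <- function(to, from) dlnorm(to, meanlog = log(from), sdlog = 1)
mh_accept <- function(x_n, x_star) {
  min(1, (f(x_star) / f(x_n)) *
         (q_dens(x_n, x_star) / q_dens(x_star, x_n)))
}
mh_accept(1, 2)   # 2 * exp(-1), about 0.736
```

Here the proposal ratio works out to exactly 2, which partially offsets the target ratio \(e^{-1}\): the correction makes the uphill-in-\(q\) move more acceptable.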
metropolis_step <- function(x, sigma) {
  # generate proposal from a symmetric normal centered at x
  proposed_x <- rnorm(1, mean = x, sd = sigma)
  # calculate A (target() is the unnormalized target density)
  accept_prob <- min(1, target(proposed_x) / target(x))
  # accept/reject
  accepted <- runif(1) <= accept_prob
  value <- ifelse(accepted,
                  proposed_x,
                  x)
  value
}

Code adapted from: Navarro, Danielle. 2023. “The Metropolis-Hastings Algorithm.” April 12, 2023. https://blog.djnavarro.net/posts/2023-04-12_metropolis-hastings/.
Tons of new MCMC algorithms. But the cool ones use gradients to choose samples.
Many state-of-the-art samplers use Hamiltonian Monte Carlo.